Many systems in nature, society and technology can be described as networks, where the vertices
are the system's elements and edges between vertices indicate the interactions between the
corresponding elements. Edges may be weighted if the interaction strength is measurable. However,
the full network information is often redundant: tools and techniques from network analysis do
not work, or become very inefficient, if the network is too dense, and some weights may just reflect
measurement errors and should be discarded. Moreover, since weight distributions in many complex
weighted networks are broad, most of the weight is concentrated among a small fraction of all edges.
It is then crucial to properly detect relevant edges. Simple thresholding would leave only the largest
weights, disrupting the multiscale structure of the system, which underlies the structure of
complex networks and ought to be kept. In this paper we propose a weight filtering technique based
on a global null model (GloSS filter), keeping both the weight distribution and the full topological
structure of the network. The method correctly quantifies the statistical significance of weights
assigned independently to the edges from a given distribution. Applications to real networks reveal
that the GloSS filter is indeed able to identify relevant connections between vertices.



PACS numbers: 89.75.-k 

I. INTRODUCTION 

A popular way to look at a complex system is to turn
it into a graph, or network, by highlighting the fundamental
elements of the system (vertices) and the interactions
between them (edges connecting vertices), possibly with
their strength (weights on edges). Due to the recent
availability of massive data sets and of computational facilities
capable of processing them, many networked systems have
been carefully investigated in recent years [1-7].

A recurrent property is the heterogeneity in the
distributions of the main structural features of such systems.
These include purely topological attributes, like the number
of neighbors of a vertex (degree) [8, 9], as well as
variables depending on the weighted character of the edges,
like the edge weights and the sum of the weights of the
edges incident on a vertex (strength) [10]. Such heterogeneity
is responsible for peculiar properties of complex
networks, like their high robustness against random attacks
or failures [11]. Weights and topology are by no
means independent, revealing a set of non-trivial
relationships [10]. For this reason it is improper to separate
weights from topology and to study the system by
exploiting only one of the two sources of information.

However, keeping the full information about the network
can give rise to problems. A large network with a
high edge density may be intractable by traditional tools
of network analysis. For instance, it may be impossible to
produce a meaningful visualization of the network. Also,
a high edge density is a serious obstacle for graph clustering
techniques [12], most of which rely on the working
assumption that the network is sparse, i.e. that the
number of edges is not much larger than the number of
vertices. Other analysis tools may not be applicable due
to their high computational complexity. In addition, the
estimates of the edge weights may be biased by measurement
errors, so the connections between some pairs of
vertices might not be meaningful.

For all these reasons, it is important to develop suitable
techniques to reduce the network, by maintaining
only the most valuable information. The problem of
information reduction in datasets has a long tradition and
has led to the design of very popular methods, like Principal
Component Analysis [13]. For networked data a well
known strategy is coarse graining [14-17], which consists
in grouping vertices based on their mutual similarity or
topological role in the network and replacing each group
with a super-vertex. Here, instead, we wish to preserve
all vertices and act only on the edges, by selecting the
most relevant ones. This is a major challenge. For one
thing, it should be clarified what "relevant" means, as
this is not straightforward. In fact, several options are
possible, depending on the features of the system that
are to be preserved. Since edge weights are usually broadly
distributed, keeping just the largest weights is a viable
option, as a few edges account for most of the total
weight. All weights lower than a predefined threshold
could then be erased [18-22]. However, global thresholding
has two drawbacks. On the one hand, it introduces
a scale in an originally multi-scale system. On the other
hand, it may spoil important topological properties. For
instance, it may fragment the network into a large
collection of components. To avoid that, one may construct
a maximum spanning tree [23], where as many edges as
possible are removed while maintaining the connectedness
of the graph and keeping the largest possible total weight
on the remaining edges. This traditional technique is
also not ideal, as it reduces the network to an acyclic
graph (a tree), whereas cycles are very important structural
features of complex networks. Moreover, a tree has
a number of edges equal to the number of vertices minus
one, and it is unlikely that the number of relevant
edges simply depends on the number of vertices, for any
system. Tumminello et al. have shown that many more
edges, and hence more information, can be kept by
extracting a subgraph that can be embedded on a surface
of genus k, instead of a tree [24].
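To make the two traditional reductions concrete, the following is a minimal Python sketch of global thresholding and of the maximum spanning tree. It is only an illustration under stated assumptions: the function names, the toy graph and the choice of Kruskal's algorithm are ours and are not taken from the cited works.

```python
def global_threshold(edges, w_min):
    """Keep only the edges whose weight is at least w_min."""
    return [(u, v, w) for u, v, w in edges if w >= w_min]


def maximum_spanning_tree(n_vertices, edges):
    """Kruskal's algorithm, scanning the edges by decreasing weight."""
    parent = list(range(n_vertices))

    def find(x):
        # Root of the component containing x (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:      # the edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical toy graph: a heavy triangle (0, 1, 2) and a weakly attached vertex 3.
edges = [(0, 1, 10.0), (1, 2, 9.0), (0, 2, 8.0), (2, 3, 0.5)]
kept = global_threshold(edges, w_min=1.0)
tree = maximum_spanning_tree(4, edges)
```

In this toy case the threshold disconnects vertex 3, while the tree keeps the graph connected but breaks the triangle, which mirrors the two drawbacks discussed above.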

Still, selecting edges with a systematic bias towards
the largest weights would destroy the heterogeneity in
the distribution of edge weights, which is a crucial feature
of complex weighted networks. Furthermore, this
could significantly modify the coupling between weights
and topology. By now there are a few methods capable
of filtering the information on the edges while respecting
the multiscale structure of complex weighted networks.
Such techniques include a two-stage algorithm proposed
by Slater [25, 26] and a method by Glattfelder and
Battiston [27] based on a multilevel network analysis. In
recent works by Serrano et al. [28, 29] the focus is on
the immediate neighborhood of each vertex. For a given
vertex, the weights on its incident edges are analyzed,
and those edges carrying a significant fraction of the total
strength of the vertex are picked. The significance of
the weight is estimated from the so-called disparity function,
which results from a simple null model stating how
weights are distributed among the edges incident on the
vertex. Here we focus on the edges, i.e. on pairs of connected
vertices, rather than on the individual vertices.
Unfortunately, it is not possible to treat pairs of connected
vertices independently of the rest of the network,
as they are attached to other vertices, and so on. The natural
solution is a global null model that accounts for the full
topology of the network, while preserving the heterogeneity
of the weight distribution. In this paper we propose
the Global Statistical Significance (GloSS) filter, which
satisfies these constraints.

At variance with other techniques, the GloSS filter 
yields a well defined global p-value for all edge weights 
of the network. Furthermore, it correctly identifies situ- 
ations in which all edges are equally relevant/irrelevant, 
like when weights are independently and identically dis- 
tributed on the edges. Finally, the performance of the 
GloSS filter on several real networks, both directed and 
undirected, is compared with that of other filtering tech- 
niques. 



II. RESULTS AND DISCUSSION 

A. The GloSS filter 

The starting point is the weight matrix $W$, whose element
$w_{ij}$ indicates the weight of the edge joining vertices
$i$ and $j$. If there is no edge/interaction between $i$ and $j$,
$w_{ij} = 0$. The number of neighbors of vertex $i$ is its
degree $k_i$. We also recall that the strength [10] $s_i$ of vertex
$i$ is the sum of the weights of the edges incident on $i$:

$$ s_i = \sum_j w_{ij} . $$

Our null model is a graph where the connections
of the original network are locked, while weights
are assigned to the edges by randomly extracting values
from the observed weight distribution $P_{obs}(w)$. This null
model thus preserves both the topology and the weight
distribution of the original network, by construction.
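As an illustration of these definitions, the sketch below computes degrees and strengths from an edge list and draws one realization of the null model. It assumes that "randomly extracting values from the observed weight distribution" means independent draws, with replacement, from the list of observed weights; the function names and the toy edge list are hypothetical.

```python
import random
from collections import defaultdict


def degrees_and_strengths(edges):
    """Return (k, s): degree and strength of each vertex of an undirected graph."""
    k, s = defaultdict(int), defaultdict(float)
    for i, j, w in edges:
        k[i] += 1
        k[j] += 1
        s[i] += w
        s[j] += w
    return dict(k), dict(s)


def null_model_sample(edges, rng):
    """Keep every connection, redraw each weight independently from the observed ones."""
    observed = [w for _, _, w in edges]
    new_weights = rng.choices(observed, k=len(edges))   # i.i.d. draws from P_obs
    return [(i, j, w) for (i, j, _), w in zip(edges, new_weights)]


# Hypothetical toy network.
edges = [("a", "b", 5.0), ("a", "c", 1.0), ("b", "c", 2.0), ("c", "d", 7.0)]
k, s = degrees_and_strengths(edges)
randomized = null_model_sample(edges, random.Random(0))
k_null, s_null = degrees_and_strengths(randomized)
```

Degrees are identical in the original and in the randomized graph, while strengths generally differ from one realization to the next.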

Suppose that we want to evaluate the statistical significance,
according to this null model, of the edge between
vertices $i$ and $j$, with observed weight $w_{ij}$. The degrees
and strengths of $i$ and $j$ are $k_i$, $k_j$, $s_i$ and $s_j$. This can be
formalized by means of a Bayesian approach. The probability
to observe weight $w_{ij} \neq 0$ on the edge, given the
degrees and strengths of its endvertices, reads

$$ P(w_{ij} \mid s_i, k_i, s_j, k_j) = P_{obs}(w_{ij}) \, \frac{P(s_i, s_j \mid w_{ij}, k_i, k_j)}{P(s_i, s_j \mid k_i, k_j)} . \qquad (1) $$

The denominator on the right hand side is a normalization
factor, while $P_{obs}(w_{ij})$ is a well defined number. In
order to estimate the term in the numerator we must
take into account that $w_{ij}$, $k_i$, $k_j$ are given and so the
"free" variables contributing to $s_i$ and $s_j$ are the weights
of the remaining $k_i - 1$ and $k_j - 1$ connections of vertices
$i$ and $j$, respectively. These weights can be treated as
independent random variables in the null model, with
the only restrictions that $\sum_{k \neq j} w_{ik} = s_i - w_{ij}$ and
$\sum_{k \neq i} w_{jk} = s_j - w_{ij}$. This implies that

$$ P(s_i, s_j \mid w_{ij}, k_i, k_j) = F(s_i - w_{ij}, k_i - 1) \times F(s_j - w_{ij}, k_j - 1) . \qquad (2) $$



The function $F(s, k)$ is the probability of randomly
extracting, from the weight distribution $P_{obs}(w)$, $k$ elements
whose sum is equal to $s$, which means that

$$ F(s, k) = \int dx_1 \, P_{obs}(x_1) \int dx_2 \, P_{obs}(x_2) \cdots \int dx_k \, P_{obs}(x_k) \, \delta(x_1 + x_2 + \cdots + x_k - s) , \qquad (3) $$

where the Dirac delta $\delta(x_1 + \ldots + x_k - s)$ ensures the
satisfaction of the constraint on the vertices' strength.
We remark that, if either $i$ or $j$ (or both) has degree 1,
Eq. (2), as it stands, would not be defined. Here the whole
strength of $i$ (or $j$) would come from the edge $ij$, so the
probability distribution of observing that weight is just
a $\delta$-function centered at $w_{ij}$, since no other values are
compatible with the strength of the vertex ($s_{i,j} = w_{ij}$ if
$k_{i,j} = 1$).

Finally, the statistical significance (or p-value) $\alpha_{ij}$ of
the observed edge weight $w_{ij}$ can be computed by
calculating the integrals

$$ \alpha_{ij} = P(> w_{ij} \mid s_i, k_i, s_j, k_j) = \frac{\int_{w_{ij}}^{\infty} dw \, P_{obs}(w) \, P(s_i, s_j \mid w, k_i, k_j)}{\int_{0}^{\infty} dw \, P_{obs}(w) \, P(s_i, s_j \mid w, k_i, k_j)} . \qquad (4) $$

Despite its apparently high complexity, the computation
of the significance level can be carried out numerically in
a fast and accurate way. The probability function $F(s, k)$
can in fact be viewed as a multiple convolution integral of
the weight distribution function and its computation may
be performed by invoking the convolution theorem. First
the Fourier transform of the weight distribution is
calculated, then its $k$-th power; the final answer is obtained
by computing the inverse Fourier transform of the result (see
details in Appendix A). The extension of the former
procedure to directed networks is straightforward. If $w_{ij}$
denotes the weight of the directed edge going from vertex
$i$ to vertex $j$, it is sufficient to substitute in the former
equations $k_i$ and $s_i$ with $k_i^{out}$ and $s_i^{out}$, respectively. For
vertex $j$ one ought to replace $k_j$ with $k_j^{in}$ and $s_j$ with $s_j^{in}$.
Once the p-value of each edge has been determined, we
can establish a certain threshold and deem the edges as
significant if their p-values lie below that threshold, since
a small p-value marks a weight that is unlikely to be reached
by chance in the null model. This
procedure defines what we have called the GloSS filter.
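The procedure can be sketched in a few lines of Python. This is a minimal illustration under explicit assumptions, not the authors' software: weights are taken to be positive integers, so that each bin holds exactly one weight value; the number of bins is the smallest power of 2 exceeding $k_{max} w_{max}$; the tail in the numerator of Eq. (4) includes the observed weight itself; and all function names and the toy network are hypothetical. A small radix-2 FFT is written out to keep the example self-contained.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of 2."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out


def sum_distributions(p_obs, k_max):
    """table[k][s] = F(s, k): probability that k i.i.d. draws from p_obs sum to s."""
    n = len(p_obs)
    spectrum = fft([complex(p) for p in p_obs])
    power = [1 + 0j] * n          # zeroth power: F(s, 0) is a delta at s = 0
    table = []
    for _ in range(k_max + 1):
        table.append([z.real / n for z in fft(power, inverse=True)])
        power = [a * b for a, b in zip(power, spectrum)]
    return table


def gloss_p_value(w_ij, s_i, k_i, s_j, k_j, p_obs, table):
    """Discrete analogue of Eq. (4), counting weights >= w_ij in the numerator."""
    def joint(w):
        # P_obs(w) * F(s_i - w, k_i - 1) * F(s_j - w, k_j - 1), cf. Eq. (2)
        return p_obs[w] * table[k_i - 1][s_i - w] * table[k_j - 1][s_j - w]

    w_top = min(s_i, s_j)         # larger weights would leave a negative remainder
    total = sum(joint(w) for w in range(w_top + 1))
    tail = sum(joint(w) for w in range(w_ij, w_top + 1))
    return tail / total


# Hypothetical toy network with integer weights: (i, j, w_ij).
edges = [(0, 1, 1), (0, 2, 1), (0, 3, 4), (1, 2, 2), (2, 3, 1)]
k_max, w_max = 3, 4               # largest degree and largest weight of the toy network
size = 1
while size <= k_max * w_max:      # power of 2 large enough to avoid wrap-around
    size *= 2
p_obs = [0.0] * size
for _, _, w in edges:
    p_obs[w] += 1 / len(edges)
table = sum_distributions(p_obs, k_max - 1)
# Edge (0, 3): w = 4, with s_0 = 6, k_0 = 3 and s_3 = 5, k_3 = 2.
alpha = gloss_p_value(4, 6, 3, 5, 2, p_obs, table)
```

Since the zeroth power of the transform gives back a delta function at $s = 0$, the degree-1 case discussed above is handled by the same table without a separate branch.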



B. Tests on random weight distributions 

Ideally, any filtering procedure should be able to
recognize situations in which there are no significant
weights. For instance, given a distribution, we could
assign weights taken from that distribution to each edge,
independently of the other edges. In this way, the
distribution of the weights on the edges would be random, with
no correlations with topological features. Therefore, the
fluctuations of the weights coming from such a distribution
are just the expected fluctuations of the distribution itself,
whose statistical significance is exactly indicated by
the p-value $\alpha$ of Eq. (4). The probability $P(<\alpha)$ for an
observed weight to have a p-value $\alpha$ or lower is then
exactly equal to $\alpha$, as all p-values are equally probable. In
Fig. 1 we show the profile of $P(<\alpha)$ on random networks
with power law distributions of degrees and weights, with
exponents $\gamma$ and $\beta$, respectively. The four panels
correspond to different choices of $\gamma$ and $\beta$. For high values of
the exponents (like $\gamma, \beta = 100$) the power law distribution
is effectively exponential. In all cases we see that the
GloSS filter recovers the expected relation $P(<\alpha) = \alpha$
(diagonal continuous line), which indicates that indeed
weights are randomly distributed among the edges and
there are no significant fluctuations. The Disparity filter
by Serrano et al. [28], instead, displays a different
profile (dashed line). For actual power law distributions of
weights (Figs. 1A and 1C), it yields the expected pattern
up to a p-value of about 0.4, then it deviates from
it. In particular, for the case of exponential distributions
of weights (Figs. 1B and 1D), all observed weights
have essentially the same p-value $\alpha \approx 0.4$ (yielding the
approximate step function for the cumulative distribution
displayed in the figure). In this case the values of the weights are
quite close to each other, and the method has difficulty
distinguishing between them. We remark that, even if
edge weights are quite homogeneous here, once their
distribution is defined one can always assign to each weight
a proper likelihood (p-value), and discuss its
compatibility with the chosen distribution. The different
results obtained with the Disparity filter are due to the
different null model adopted by this filter, which is local.
However, unlike for the GloSS filter, it is not
possible to build a network based on the null model of
the Disparity filter, precisely because of its local character. It
is only possible to restrict the picture to the subgraph
consisting of a node and its incident edges.

Figure 1: (Color online) Cumulative distribution $P(<\alpha)$
of the significance level $\alpha$ for independent identically
distributed weights. Networks are made of N = 1 000 vertices
and have minimum degree equal to 5. Connections among
vertices are randomly drawn by preserving the a priori given
degree sequence. Vertex degrees and edge weights are
randomly chosen from the power law distributions $P(k) \sim k^{-\gamma}$
and $P(w) \sim w^{-\beta}$, respectively. The statistical significance of
the weights, for different choices of $\gamma$ and $\beta$, is computed with
the GloSS filter (continuous curve) and the Disparity filter by
Serrano et al. [28] (dashed curve).
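The diagonal relation $P(<\alpha) = \alpha$ reflects a general property of p-values computed under the same distribution that generated the data. The Python sketch below illustrates it in a deliberately simplified setting: it ignores the conditioning on degrees and strengths used by the GloSS filter, and it assumes exponentially distributed weights with unit rate, whose upper tail probability is $e^{-w}$. The sample size, the seed and the function names are arbitrary choices made for this illustration and do not reproduce the networks of Fig. 1.

```python
import math
import random


def upper_tail_p_values(weights, survival):
    """p-value of each weight: probability of drawing a value at least as large."""
    return [survival(w) for w in weights]


def cumulative(p_values, alpha):
    """Fraction of p-values not larger than alpha, i.e. the empirical P(< alpha)."""
    return sum(p <= alpha for p in p_values) / len(p_values)


rng = random.Random(1)            # fixed seed, only for reproducibility
weights = [rng.expovariate(1.0) for _ in range(20000)]
p_values = upper_tail_p_values(weights, lambda w: math.exp(-w))
profile = [(a, cumulative(p_values, a)) for a in (0.1, 0.25, 0.5, 0.75, 0.9)]
```

Up to sampling noise, each pair in `profile` has nearly equal entries, which is the behavior represented by the diagonal line.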



C. Tests on real networks 

Here we show some applications of our filtering
procedure to real weighted networks. First we focus our
attention on the most "significant" weights of the network. For
this purpose we take the World Trade Web (WTW) [30],
i.e. the network of trade relationships of world countries.
Vertices represent the countries and edges are directed
and weighted by the money flow running from one
country to the other (import/export). The WTW is
very useful to study the propagation of economic crises and
has been thoroughly investigated in recent years [30-32].
Data are freely available [33, 34]. The data we considered
refer to the year 2006: the network has 189 vertices and
12 705 edges. In Fig. 2 we show the 50 most significant
edges, selected with the GloSS (left) and the Disparity
(right) filter, respectively. We see that the results are
quite different, even if some of the edges coincide. In
particular, the GloSS filter is more likely to capture
connections involving smaller/poorer countries than the
Disparity filter, which more frequently selects larger
countries and trade exchanges. This is manifest in Table I,
where we list the top 20 edges, along with their weights
and p-values.

Figure 2: (Color online) Top 50 connections of the World Trade Web in year 2006: GloSS filter (left), Disparity filter (right).
Countries without edges are removed from the picture.

Table I: List of the top 20 most relevant connections of the World Trade Web according to the GloSS (left) and the Disparity
filter (right), respectively. The weights are evaluated in millions of dollars. The edges selected by the Disparity filter carry on
average much larger weights and have far lower p-values than those picked by GloSS.

Interesting economic relations, revealed as anomalous
by the GloSS filter, are those between China and North
Korea and also those relating China to Togo, Burkina
Faso and Benin. While the existence of an anomalous
connection between China and North Korea can be
explained in terms of simple political reasons, the relations
of China with the African countries have deeper
economic foundations based on agreements on trade,
economic and technological cooperation. Particularly
relevant economic relations are also those established
between Australia and Papua New Guinea, between Italy
and Albania and between France and Gabon. Papua
New Guinea became independent from Australia only in
1975, but its economic development is still controlled by
Australia. After the collapse of communism in Albania
(1991), a mass exodus of refugees moved to Italy.
Albanians nowadays form one of the largest foreign communities
in Italy and strong trade relationships are present
between the two countries. Gabon was a colony of France
up to 1960, but still maintains exclusive political and
economic relationships with France.

Figure 3: (Color online) Applications of filtering techniques on two real weighted undirected networks: a network of US
senators (left) and Zachary's karate club (right). For each network we show the size of the largest connected component and
the heterogeneity parameter of the degree (k), strength (s) and edge weight (w) distributions as a function of the number of
edges added to the system (in decreasing order of relevance). The continuous line stands for the results of the GloSS filter, the
dot-dashed line for those of the Disparity filter, the dashed line for global thresholding. In addition, we show scatter plots of the
edge rankings estimated by the GloSS filtering technique and the other two considered here: Disparity and global thresholding.

We now proceed with a more systematic study of the
importance of the selected weights for the structure of the
network. Since the goal is to reduce the information of
the system while keeping as many of its features as possible,
one may wonder how many edges, picked in descending
order of significance, are necessary to reproduce the most
important features of the original weighted graph. For
instance, how many edges are needed to form a connected
graph? This test has been suggested in Ref. [25]. In
addition, we wish to check when the distributions of the
vertex degrees, vertex strengths and edge weights are
restored. Since it is hard to verify the match of two
distributions, while it is far easier to compare two numbers,
we limit the comparison to an important property of a
distribution, the heterogeneity parameter, expressing the
dispersion of the distribution around its average. For a
variable $x$ with a certain probability distribution, the
heterogeneity parameter is defined as the ratio of the second
moment of the distribution to the square of the first
moment: $\langle x^2 \rangle / \langle x \rangle^2$. Our tests then consist in adding edges
until the heterogeneity parameters of the distributions
of the reduced network reach those of the original
network and remain stable until the last edges are added.
In Appendix B we use an alternative measure for the
comparison of distributions: the Kullback-Leibler
divergence [36]. We carried out the tests by using three
different filtering techniques: GloSS, Disparity and global
thresholding. We also compare the rankings produced by
our filtering method with those obtained with the other
techniques to estimate their correlation.
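The test can be sketched as follows in Python. The ranking of the edges is assumed to be supplied by one of the filters; the function names and the toy ranking are hypothetical. For brevity the sketch tracks only the largest connected component and the heterogeneity parameter of the edge weights, while degrees and strengths would be handled in the same way.

```python
def heterogeneity(values):
    """Heterogeneity parameter <x^2> / <x>^2 of a non-empty list of numbers."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum(x * x for x in values) / n
    return second_moment / (mean * mean)


def backbone_profile(n_vertices, ranked_edges):
    """After each added edge: (edges added, largest component size, weight heterogeneity)."""
    parent = list(range(n_vertices))
    size = [1] * n_vertices

    def find(x):
        # Root of the component containing x (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest, weights, profile = 1, [], []
    for u, v, w in ranked_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            if size[root_u] < size[root_v]:
                root_u, root_v = root_v, root_u
            parent[root_v] = root_u          # merge the smaller component into the larger
            size[root_u] += size[root_v]
            largest = max(largest, size[root_u])
        weights.append(w)
        profile.append((len(weights), largest, heterogeneity(weights)))
    return profile


# Hypothetical ranking, in decreasing order of relevance, for a 4-vertex graph.
ranked_edges = [(0, 1, 4.0), (2, 3, 1.0), (1, 2, 1.0)]
profile = backbone_profile(4, ranked_edges)
```

Comparing the last entry of `profile` with the earlier ones shows at which point the reduced graph becomes connected and when the heterogeneity parameter reaches the value of the full network.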

We start with two undirected graphs: a network of US
senators [35] and the karate club network of Zachary [37].
The first is a network with 99 vertices, corresponding to
members of the 109th Senate of the United States who
served for the full two-year term. The
edge between a pair of senators is weighted by the
number of times they have voted in the same way (the total
number of edges is 4 851). The data are freely available
from http://voteview.com. Naturally, senators of the
same party (Republican or Democratic) are more likely
to vote together than senators of different parties.
Consequently, the distribution of edge weights is bimodal,
with two groups of values corresponding to edges joining
two Republican or two Democratic senators and to edges joining
Republicans to Democrats. Zachary's karate club
network consists of 34 vertices and 78 edges, corresponding
to the members of a karate club in the USA and their
social relationships. It has become quite popular lately
as it is frequently used as a benchmark to test algorithms
for community detection [12]. In Fig. 3 we show the
results of our analysis of both graphs. The performances
of the GloSS and Disparity filters are rather similar. For
the senators network we see that after adding about 40%
of the edges the reduced network acquires the features
of the original one. In this case, there is a strong
correlation between the GloSS filter and global thresholding
when it comes to selecting the most relevant edges.
This is due to the fact that the senator network is almost
fully connected and its weight distribution is bimodal
(as opposed to the typically broad distributions observed
in many systems). Under the null model assumption of
random assignments of weights (from the given bimodal
distribution), the larger weights between members of the
same party are more likely to be deemed relevant by the
GloSS filter.

Figure 4: (Color online) Applications of filtering techniques on two real weighted directed networks: the US airport network
(left) and the World Trade Web (right). The panels are analogous to those of Fig. 3, although those relative to the degree
and strength distributions are split to account for the two possible edge directions (incoming and outgoing). The continuous
line stands for the results of the GloSS filter, the dot-dashed line for those of the Disparity filter, the dashed line for global
thresholding.

Finally, we discuss applications to directed networks.
We take four datasets: the World Trade Web (WTW),
the air transportation network of the USA, the Florida
Bay ecosystem in the dry season [38] and a network of
commuting in the UK. The WTW has been described
at the beginning of this subsection. Data on the US
air transportation network can be downloaded from the
Bureau of Transportation Statistics (US government)
(http://www.bts.gov). Vertices are US airports and
edges are weighted by the number of passengers
transported along the corresponding routes in the year 2000.
Our network has 664 vertices and 15 132 edges. The food
web of Florida Bay comprises the trophic interactions
between species, weighted by carbon transfers from one
species to another. The network has been constructed
within the ATLSS Project of the University of Maryland
(http://www.cbl.umces.edu/atlss.html). There are 125
species and 1 969 interactions. The network of
commuting is composed of 376 vertices, representing local
authorities, geographical divisions covering the territories
of England and Wales. Each of the 72 954 directed edges
corresponds to a flow of commuters between the local
authority of origin and that of destination, with a weight
accounting for the number of commuters per day. The
data come from the 2001 UK census, where the local
authority of residence and of work/study is registered for a
significant part of the British population. The database
can be accessed online at the site of the Office for
National Statistics (http://www.ons.gov.uk/census).

In Fig. 4 we show the results of our analysis for the
WTW and the US airport network, following the same
scheme as in Fig. 3. The results for the food web and the
network of commuting are reported in Fig. 5. We remark
again a substantial similarity between the GloSS and the
Disparity filter. This seems odd, as the two filtering
procedures are very different in their selection of the most
significant edges, as we have shown in Fig. 2 and Table I.
What emerges from Figs. 3, 4 and 5 is that if a sizable
fraction of edges are picked, both filters select mostly the
same weights, so after a while the reduced descriptions
of the network would match or become very similar. On
the other hand, global thresholding is clearly inadequate
to capture the main properties of the original network, for
it requires many more edges to recover them, as already
pointed out in Ref. [28].

Figure 5: (Color online) Applications of filtering techniques on two real weighted directed networks: the food web of Florida
Bay in the dry season and the commuting network between cities in UK. The panels report the same analyses as those of Fig. 4.
The continuous line stands for the results of the GloSS filter, the dot-dashed line for those of the Disparity filter, the dashed
line for global thresholding.



We close the section by performing a study analogous
to that reported in Fig. 1, but for some of the real
networks examined here (Fig. 6). For the GloSS filter
(continuous lines) we find different patterns than that
expected for the null model, in which all p-values are
equally probable. Only for the US airports the p-values
have roughly the same probability, up until $\alpha \approx 0.8$. For
Zachary's karate club, the WTW and the food web, there
are significant differences with respect to the null model.
The Disparity filter (dashed lines) displays a markedly
different behavior: with the exception of Zachary's karate
club, very low $\alpha$ values are much more frequent than
found by the GloSS filter.



III. CONCLUSIONS 

Filtering the information of complex weighted net- 
works is crucial both to detect the most relevant con- 
nections and to be able to process a system that is often 
too large for many analytical tools to work efficiently. In 
this paper we have presented the first filtering technique 
based on a consistent global null model, preserving both 
the distribution of edge weights and the full topology of 
the graph. The recipe is by no means unique and it would 
not be difficult to propose alternatives with slight mod- 
ifications of the main ingredients. In fact, filters are as 
arbitrary as the notion of "relevant information" is, so 
objective comparisons of different strategies are unfea- 
sible. Still, there are situations in which the answer of 
the filter is intuitive. For instance, if weights are indepen- 
dently and identically distributed among the edges, there 
should be no anomalous fluctuations and, consequently, 
the p-values of the edges should be homogeneously dis- 
tributed. We have seen that our GloSS filter indeed quan- 
tifies the correct statistical significance in such instances, 
while other techniques have problems. 

Tests on real weighted networks show that the GloSS
filter is capable of subsuming the basic information about
the system in a fairly small fraction of the edges,
especially the multiscale structure of both the topology and
the edge weights. While we have put some emphasis on
networks with heterogeneous distributions of features, we
remark that our procedure is very general and it applies
as well to cases in which distributions are peaked, as
we have seen for the network of US senators. The
significance of the edges is not as strongly correlated with
their weights as for other techniques, so we are able to
obtain potentially relevant information also from the
vertices with low strength and degree and, consequently, a
more balanced tradeoff between topology and weights.

Therefore we believe that the GloSS filter is a valuable
tool for the analysis of networked datasets. The
procedure is implemented in a freely downloadable software
(http://filrad.homelinux.org/resources).

Figure 6: (Color online) Cumulative distribution $P(<\alpha)$ of
the significance level $\alpha$ for weights taken from the observed
distribution of some of the real networks we considered. The
continuous line corresponds to the GloSS filter, the dashed
line to the Disparity filter.



Appendix A: Numerical implementation of the
GloSS filter

[...] bins $b$ must be a power of 2. The range of values we
are interested in is $[0, S]$, where $S = k_{max} w_{max}$ is the
[...] needed in order to be able to distinguish different weight
values: if $\delta w$ is the minimum value of the difference
among all pairs of unequal weights in the network, we
set $Q = \lceil \log_2 (S / \delta w) \rceil$ and perform the linear binning
of $P_{obs}(w)$ over $b = 2^Q$ bins. We implement our
filtering technique by calculating the Fourier transform of the
weight distribution and all its powers up to $k_{max}$. For
each resulting expression we obtain the inverse Fourier
transform and finally compute the p-values of all edges
according to Eq. (4). The complexity of the various stages of
our algorithm can be simply estimated: $b \log_2 (b)$ is the
typical complexity for calculating the Fourier transform
or its inverse; computing the powers of the Fourier
transform requires a time which grows as $b \, k_{max}$;
deriving the inverse of the Fourier transform for each power
scales as $k_{max} \, b \log_2 (b)$; evaluating the statistical
significance for each of the $M$ edges in the network goes as
$M b$. Since in general $M \gg k_{max}$, the computational
complexity of the whole filtering technique proposed in
this paper is $M b = M 2^Q$.



Appendix B: Matching the backbone and the
original graph

In Section II C we have compared the distribution of
local properties of the backbone with that of the original
graph, to check how many edges are needed to reproduce
the basic features of the graph under study. For this
purpose we have compared the heterogeneity parameters of
corresponding distributions as a function of the fraction
of added edges. To give more robustness to our results,
we consider here an alternative measure for the
comparison of distributions, the Kullback-Leibler (KL)
divergence [36], a well-known measure in information theory.
The results are shown in Fig. 7 for four real networks.
As we had found in Section II C, there is little difference
between the GloSS and the Disparity filters, while global
thresholding follows slightly different trends.